A Study of Tones and Tempo in Continuous Mandarin Digit Strings and Their Application in Telephone Quality Speech Recognition
Abstract
Prosodic cues (namely fundamental frequency, energy, and duration) carry important information in speech. In a tonal language such as Chinese, fundamental frequency (F0) also plays a critical role in characterizing tone, an essential phonemic feature. In this paper, we describe our work on duration and tone modeling for telephone-quality continuous Mandarin digits, and the application of these models to improve recognition. The duration modeling includes a speaking-rate normalization scheme. A novel F0 extraction algorithm is developed, and parameters based on an orthonormal decomposition of the F0 contour are extracted for tone recognition. Context dependency is expressed by “tri-tone” models clustered into broad classes. A 20.0% error rate is achieved for four-tone classification. Relative to a baseline word error rate of 5.1%, we achieve a 31.4% error reduction with duration models, a 23.5% error reduction with tone models, and a 39.2% error reduction with duration and tone models combined.
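To make the idea of describing a tone by an orthonormal decomposition of its F0 contour concrete, here is a minimal Python sketch. The abstract does not say which basis the authors use, so the discrete polynomial basis below (obtained by QR-orthonormalizing 1, t, t^2, ...) is an assumption, as are the function names `orthonormal_basis`, `decompose_f0`, and `reconstruct_f0` and the default order of 3. It illustrates the general technique and is not a reconstruction of the paper's algorithm.

```python
import numpy as np


def orthonormal_basis(num_frames, order=3):
    """Rows are orthonormal discrete polynomials of degree 0..order.

    Assumption: a polynomial basis; the paper's actual basis is not
    given in the abstract. Requires num_frames >= order + 1.
    """
    t = np.linspace(-1.0, 1.0, num_frames)
    # Columns of the Vandermonde matrix are 1, t, t^2, ..., t^order.
    vander = np.vander(t, order + 1, increasing=True)
    # QR factorization orthonormalizes the columns (Gram-Schmidt equivalent).
    q, r = np.linalg.qr(vander)
    # Flip signs so every basis function has a positive leading coefficient.
    q = q * np.sign(np.diag(r))
    return q.T


def decompose_f0(f0, order=3):
    """Project an F0 contour (one value per frame) onto the basis."""
    f0 = np.asarray(f0, dtype=float)
    return orthonormal_basis(len(f0), order) @ f0


def reconstruct_f0(coeffs, num_frames):
    """Rebuild a smoothed contour from its coefficients."""
    coeffs = np.asarray(coeffs, dtype=float)
    return coeffs @ orthonormal_basis(num_frames, len(coeffs) - 1)


if __name__ == "__main__":
    # Hypothetical falling contour in Hz, 30 voiced frames.
    contour = np.linspace(220.0, 180.0, 30)
    coeffs = decompose_f0(contour)
    # coeffs[0] tracks the average height, coeffs[1] the slope,
    # and higher coefficients the curvature of the contour.
    print(coeffs)
```

Because the basis is orthonormal, each coefficient is a plain inner product with the contour and the coefficients can be truncated independently, which gives a tone classifier a compact, fixed-length feature vector regardless of syllable duration.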
Similar resources
Modeling Lexical Tones for Mandarin Large Vocabulary Continuous Speech Recognition
Large vocabulary Mandarin speech recognition with different approaches in modeling tones
Large vocabulary continuous Mandarin speech recognition has been an important problem for speech recognition researchers for several reasons [1], [3]. First of all, it is a tonal language that requires special treatment for the modeling of tones. There are five tones in Mandarin which are necessary to disambiguate between confusable words. Secondly, the difficulty of entering Chinese by keyboar...
Identifying Dialects of German from Digit Strings
At Eurospeech 97 we presented a perception experiment on identifying regional variants of High German from digit strings in telephone speech (Draxler, Burger 1997). This experiment has been modified as follows: i) use of high quality speech recordings from the RVG1 (Burger, Schiel, 1998) corpus instead of SpeechDat telephone speech, ii) a geographically precise dialect determination of the spea...
Connected Digit Recognition Experiments with the OGI Toolkit's Neural Network and HMM-Based Recognizers
This paper describes a series of experiments that compare different approaches to training a speaker-independent continuous-speech digit recognizer using the CSLU Toolkit. Comparisons are made between the Hidden Markov Model (HMM) and Neural Network (NN) approaches. In addition, a description of the CSLU Toolkit research environment is given. The CSLU Toolkit is a research and development softwa...
Publication date: 1998